The behaviour of random forest permutation-based variable importance measures under predictor correlation
Similar resources
Random Forest variable importance with missing data
Random Forests are commonly applied for data prediction and interpretation. The latter purpose is supported by variable importance measures that rate the relevance of predictors. Yet existing measures cannot be computed when the data contain missing values. Possible solutions are given by imputation methods, complete case analysis and a newly suggested importance measure. However, it is unknown t...
Letter to the Editor: Stability of Random Forest importance measures
The goal of this article (letter to the editor) is to emphasize the value of exploring ranking stability when using the importance measures, mean decrease accuracy (MDA) and mean decrease Gini (MDG), provided by Random Forest. We illustrate with a real and a simulated example that ranks based on the MDA are unstable to small perturbations of the dataset and ranks based on the MDG provide more r...
EM-random forest and new measures of variable importance for multi-locus quantitative trait linkage analysis
MOTIVATION: We developed an EM-random forest (EMRF) for Haseman-Elston quantitative trait linkage analysis that accounts for marker ambiguity and weighs each sib-pair according to the posterior identical by descent (IBD) distribution. The usual random forest (RF) variable importance (VI) index used to rank markers for variable selection is not optimal when applied to linkage data because of corr...
Variable Importance Assessment in Regression: Linear Regression versus Random Forest
Relative importance of regressor variables is an old topic that still awaits a satisfactory solution. When interest is in attributing importance in linear regression, averaging over orderings methods for decomposing R² are among the state-of-the-art methods, although the mechanism behind their behavior is not (yet) completely understood. Random forests, a machine-learning tool for classification a...
A Permutation Importance-Based Feature Selection Method for Short-Term Electricity Load Forecasting Using Random Forest
The prediction accuracy of short-term load forecast (STLF) depends on prediction model choice and feature selection result. In this paper, a novel random forest (RF)-based feature selection method for STLF is proposed. First, 243 related features were extracted from historical load data and the time information of prediction points to form the original feature set. Subsequently, the original fe...
Journal
Journal title: BMC Bioinformatics
Year: 2010
ISSN: 1471-2105
DOI: 10.1186/1471-2105-11-110